“A/B testing (also known as bucket tests or split-run testing) is a randomized experiment with two variants, A and B. It includes application of statistical hypothesis testing or ‘two-sample hypothesis testing’ as used in the field of statistics. A/B testing is a way to compare two versions of a single variable, typically by testing a subject’s response to variant A against variant B, and determining which of the two variants is more effective.”
Source: Wikipedia
Watch the video to understand the basics of A/B testing (also called split testing).
Randomized controlled experiments are widely used by pharmaceutical companies, medical scientists, and agricultural researchers, among others.
In summary, “A/B testing can be a randomized controlled experiment, assuming you’ve controlled factors and randomized subjects, but not all randomized controlled experiments are A/B tests.”
A/B testing is used to determine the effects of digital marketing efforts, especially because in this industry small changes can have big effects.
To run an A/B test, you need:
The following describes the basic flow of a scientific step-by-step process that you can use for A/B Split Testing.
In hypothesis testing there are three possible outcomes of the test:
With no error everything is clear.
Type I error (beware! this is a really serious error) occurs when you incorrectly reject the null hypothesis and conclude that there is actually a difference between the original page and the variation when there really isn’t. In other words, you obtain false positive test results. Like the name indicates, a false positive is when you think one of your test challengers is a winner while in reality it is not.
Type II error occurs when you fail to reject the null hypothesis when it is actually false, obtaining false negative test results. In our context, a Type II error occurs when we end the test concluding that none of the variations beat the original page while in reality one of them actually did.
Type I and type II errors cannot happen at the same time:
Keep in mind that statistical errors are unavoidable.
However, the better you can quantify them, the more accurate your results will be.
When conducting hypothesis testing, you cannot “100%” prove anything, but you can get statistically significant results.
Source: A/B Testing Statistics Made Simple
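To make the Type I error rate concrete, here is a small base R simulation (illustrative only, not part of the course material): if we repeatedly A/B test two pages that share the same true conversion rate, roughly alpha (5%) of the tests will declare a “winner” purely by chance.

```r
# Simulate A/B tests where A and B share the SAME true conversion rate (20%),
# so every "significant" result is a Type I error (false positive)
set.seed(42)
n_sims <- 2000
false_positive <- replicate(n_sims, {
  clicks_a <- rbinom(1, size = 500, prob = 0.2)  # clicks in condition A
  clicks_b <- rbinom(1, size = 500, prob = 0.2)  # clicks in condition B
  prop.test(c(clicks_a, clicks_b), c(500, 500))$p.value < 0.05
})
mean(false_positive)  # proportion of false positives, close to alpha = 0.05
```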
Now that we’ve learned what A/B testing is, let’s do some A/B testing in R.
For this exercise, we are going to use the material and the ‘click_data.csv’ dataset used by DataCamp in the free Chapter of the A/B Testing in R course.
The dataset is a generated example for a cat adoption website. You will investigate whether changing the homepage image affects the conversion rate (the percentage of people who click a specific button).
# Load the libraries
library(tidyverse)
library(data.table)
# Read in data
click_data <- fread('https://assets.datacamp.com/production/repositories/2292/datasets/4407050e9b8216249a6d5ff22fd67fd4c44e7301/click_data.csv')
click_data
## visit_date clicked_adopt_today
## 1: 2017-01-01 1
## 2: 2017-01-02 1
## 3: 2017-01-03 0
## 4: 2017-01-04 1
## 5: 2017-01-05 1
## ---
## 3646: 2017-12-27 1
## 3647: 2017-12-28 0
## 3648: 2017-12-29 0
## 3649: 2017-12-30 1
## 3650: 2017-12-31 0
Let’s find the oldest and most recent dates
min(click_data$visit_date)
## [1] "2017-01-01"
max(click_data$visit_date)
## [1] "2017-12-31"
Now that we know we have one year of data, let’s determine our baseline conversion rates.
What does ‘more’ mean in this context?
Compared to the conversion rate from when?
Calculate the mean conversion rate by month
library(lubridate)
click_data$month <- month(click_data$visit_date)
click_data_month <- click_data %>%
group_by(month) %>%
summarize(conversion_rate = mean(clicked_adopt_today))
click_data_month
## # A tibble: 12 x 2
## month conversion_rate
## <dbl> <dbl>
## 1 1 0.197
## 2 2 0.189
## 3 3 0.145
## 4 4 0.15
## 5 5 0.258
## 6 6 0.333
## 7 7 0.348
## 8 8 0.542
## 9 9 0.293
## 10 10 0.161
## 11 11 0.233
## 12 12 0.465
library(scales)
ggplot(click_data_month, aes(x=month, y=conversion_rate)) +
geom_point() +
geom_line() +
scale_y_continuous(labels = scales::percent, limits=c(0,1))
Calculate the mean conversion rate by day of the week
click_data$wday <- wday(click_data$visit_date)
click_data_wday <- click_data %>%
group_by(wday) %>%
summarize(conversion_rate = mean(clicked_adopt_today))
click_data_wday
## # A tibble: 7 x 2
## wday conversion_rate
## <dbl> <dbl>
## 1 1 0.3
## 2 2 0.277
## 3 3 0.271
## 4 4 0.298
## 5 5 0.271
## 6 6 0.267
## 7 7 0.256
ggplot(click_data_wday, aes(x=wday, y=conversion_rate)) +
geom_point() +
geom_line()+
scale_y_continuous(labels = scales::percent, limits=c(0,1))
Calculate the mean conversion rate by week of the year
click_data$week <- week(click_data$visit_date)
click_data_week <- click_data %>%
group_by(week) %>%
summarize(conversion_rate = mean(clicked_adopt_today))
click_data_week
## # A tibble: 53 x 2
## week conversion_rate
## <dbl> <dbl>
## 1 1 0.229
## 2 2 0.243
## 3 3 0.171
## 4 4 0.129
## 5 5 0.157
## 6 6 0.186
## 7 7 0.257
## 8 8 0.171
## 9 9 0.186
## 10 10 0.2
## # ... with 43 more rows
ggplot(click_data_week, aes(x=week, y=conversion_rate)) +
geom_point() +
geom_line() +
scale_y_continuous(labels = scales::percent, limits=c(0,1))
Based on the previous data analysis, we have our baseline numbers and we can determine how long we should run our experiment.
But before starting the experiment, we should ask some important questions:
Besides answering these questions, it is important to determine:
Now, let’s calculate the sample sizes using the package ‘powerMediation’.
Suppose we run the experiment starting in January: based on our baseline, we expect roughly a 20% conversion rate (p1), and let’s assume we expect the test condition to reach a 30% conversion rate (p2).
For the proportion of the sample in the test condition (B), the significance level (alpha), and the power, we will use the most common values (0.5, 0.05, and 0.8, respectively).
library(powerMediation)
total_sample_size <- SSizeLogisticBin(p1 = 0.2,
p2 = 0.3,
B = 0.5,
alpha = 0.05,
power = 0.8)
total_sample_size
## [1] 587
total_sample_size/2
## [1] 293.5
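Since half a participant makes no sense, round the per-condition size up; a tiny base R check:

```r
# Round the per-condition sample size up to whole participants
ceiling(587 / 2)  # 294 participants per condition, 588 in total
```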
Now it’s your turn.
How many data points do you need in total across both conditions if we decide to run the experiment in August? Make sure you round the percentages!
Let’s say you’ve reconsidered your expectations for running the experiment in August, because increasing the conversion rate by 10 percentage points may be difficult. Rerun your power analysis assuming only a 5 percentage point increase in the conversion rate for the test condition.
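If you want to sanity-check your answers, here is a sketch using base R’s power.prop.test as a stand-in for SSizeLogisticBin (the two methods give similar, but not identical, sample sizes). The August baseline of 54% comes from rounding the 0.542 monthly conversion rate computed earlier; the 64% and 59% targets are the assumed 10 and 5 percentage point lifts.

```r
# August baseline: 0.542 rounded to 54% (p1 = 0.54)
# Scenario 1: a 10-percentage-point lift in the test condition (p2 = 0.64)
res10 <- power.prop.test(p1 = 0.54, p2 = 0.64, sig.level = 0.05, power = 0.8)
ceiling(res10$n) * 2  # total data points across both conditions

# Scenario 2: a more modest 5-percentage-point lift (p2 = 0.59)
res05 <- power.prop.test(p1 = 0.54, p2 = 0.59, sig.level = 0.05, power = 0.8)
ceiling(res05$n) * 2  # a smaller lift requires a much larger sample
```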
Now, we are going to use the example presented in the article Tips for A/B Testing with R. We will test the difference between two rates in R, e.g., click-through rates or conversion rates from two tested conditions.
library(readr)
# Specify file path:
dataPath <- "https://www.inwt-statistics.de/files/INWT/downloads/exampleDataABtest.csv"
# Read data
data <- read_csv(file = dataPath)
head(data)
## # A tibble: 6 x 3
## group time clickedTrue
## <chr> <dttm> <dbl>
## 1 A 2016-06-02 02:17:53 0
## 2 A 2016-06-02 03:03:54 0
## 3 A 2016-06-02 03:18:56 1
## 4 B 2016-06-02 03:23:43 0
## 5 A 2016-06-02 04:04:00 0
## 6 A 2016-06-02 04:34:53 0
# Inspect structure of the data
str(data)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1000 obs. of 3 variables:
## $ group : chr "A" "A" "A" "B" ...
## $ time : POSIXct, format: "2016-06-02 02:17:53" "2016-06-02 03:03:54" ...
## $ clickedTrue: num 0 0 1 0 0 0 0 0 0 0 ...
## - attr(*, "spec")=
## .. cols(
## .. group = col_character(),
## .. time = col_datetime(format = ""),
## .. clickedTrue = col_double()
## .. )
# Change type of group to factor
data$group <- as.factor(data$group)
# Change type of click through variable to factor
data$clickedTrue <- as.factor(data$clickedTrue)
levels(data$clickedTrue) <- c("0", "1")
str(data)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 1000 obs. of 3 variables:
## $ group : Factor w/ 2 levels "A","B": 1 1 1 2 1 1 2 2 2 1 ...
## $ time : POSIXct, format: "2016-06-02 02:17:53" "2016-06-02 03:03:54" ...
## $ clickedTrue: Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. group = col_character(),
## .. time = col_datetime(format = ""),
## .. clickedTrue = col_double()
## .. )
Let’s find oldest and most recent date
min(data$time)
## [1] "2016-06-02 02:17:53 UTC"
max(data$time)
## [1] "2016-06-10 01:11:15 UTC"
To test the difference between two proportions, you can use Pearson’s chi-squared test; for small samples, you should use Fisher’s exact test instead.
In R, the prop.test function runs this test and returns a p-value and a confidence interval for the difference between the two rates.
# Compute frequencies and conduct test for proportions
# (Frequency table with successes in the first column)
freqTable <- table(data$group, data$clickedTrue)[, c(2,1)]
# print frequency table
freqTable
##
## 1 0
## A 20 480
## B 40 460
# Conduct significance test
prop.test(freqTable, conf.level = .95)
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: freqTable
## X-squared = 6.4007, df = 1, p-value = 0.01141
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.071334055 -0.008665945
## sample estimates:
## prop 1 prop 2
## 0.04 0.08
Based on the test result, at the 5% significance level we reject the null hypothesis (p-value = 0.01141), which means there is statistical evidence that the conversion rate of condition A differs from that of the tested condition (B).
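As noted earlier, Fisher’s exact test is the safer choice for small samples. On these counts it agrees with prop.test; here is a quick check with base R’s fisher.test (the frequency table is rebuilt so the snippet is self-contained):

```r
# Clicks vs. non-clicks per condition, as in the frequency table above
freq <- matrix(c(20, 480,
                 40, 460),
               nrow = 2, byrow = TRUE,
               dimnames = list(group = c("A", "B"),
                               clicked = c("1", "0")))
fisher.test(freq)$p.value  # also well below 0.05
```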
Congratulations everyone!!!
I am so proud of you!
If you’d like to learn more about A/B testing and/or A/B testing in R, here are some resources: